Abstract: BotometerLite is advertised as a lightweight bot detector that improves scalability by relying only on user profile information; its authors claim that using fewer features entails only a small compromise in individual accuracy. We test the validity of this claim by comparing Botometer and BotometerLite bot-likelihood scores for 10,000 randomly sampled users. The BotometerLite scores varied drastically from the Botometer scores.

Introduction

Botometer is one of the most popular bot detection tools used in social science (Rauchfleisch and Kaiser 2020). Botometer was initially launched in May 2014, and BotometerLite was released in September 2020. The training and performance evaluation of BotometerLite are described in “Scalable and Generalizable Social Bot Detection through Data Selection” (Yang et al. 2020).

Rauchfleisch and Kaiser (2020) found that Botometer scores are imprecise at identifying bots, especially for accounts in languages other than English, are prone to variance over time, and misclassify a high number of human users as bots and vice versa.

In this study, we seek to answer the following questions:

  • How similar are Botometer and BotometerLite ratings?
  • Is BotometerLite effective at identifying specific types of bots (e.g., spammers, fake followers, etc.)?
  • Can BotometerLite be used as a triage tool to identify a subset of accounts that require more extensive evaluation via Botometer?
  • Do some topics have more assessed bots than others? Do any have significantly more than a random sample of Twitter users?

Bot Type Scores

Bot type scores describe how much an account acts like a specific kind of bot (see https://botometer.osome.iu.edu/faq):

  • Astroturf: manually labeled political bots and accounts involved in follow trains that systematically delete content
  • Fake follower: bots purchased to increase follower counts
  • Financial: bots that post using cashtags
  • Self declared: bots from botwiki.org
  • Spammer: accounts labeled as spambots from several datasets
  • Other: miscellaneous other bots obtained from manual annotation, user feedback, etc.

Complete Automation Probability (CAP) is the probability, according to the Botometer model, that an account with this score or greater is a bot.

Methodology

  1. Randomly sample 17,000 users from GWU’s TweetSets library in the following collections:
    • Coronavirus: collected March 3, 2020 to June 9, 2020
    • 2016 election: collected July 13, 2016 to November 10, 2016
    • News outlets: collected August 4, 2016 to May 12, 2020
    • Charlottesville: collected August 2017
    • Random sample drawn from the Twitter API?
  2. Collect Botometer and BotometerLite scores.
  3. Calculate correlation between scores.
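Steps 2 and 3 can be sketched as follows. The scores here are synthetic placeholders (in the real pipeline they would come from the Botometer and BotometerLite APIs); the correlation step itself is just a Pearson coefficient:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for step 2: in the real pipeline these arrays would
# hold the Botometer and BotometerLite scores collected for each user.
botometer = rng.uniform(0.0, 1.0, 1000)
lite = rng.uniform(0.0, 1.0, 1000)

# Step 3: Pearson correlation between the two sets of scores.
r = np.corrcoef(botometer, lite)[0, 1]
print(f"Pearson r = {r:.3f}, R^2 = {r**2:.3f}")
```

With independent placeholder scores, r is near zero; the study's question is whether the real scores do much better.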

Results

Raw English Scores

BotometerLite is most similar to the Botometer fake follower and spammer scores with \(R^2\) values of 0.394 and 0.334, respectively. Hence, if Botometer scores are accurate, BotometerLite may be somewhat effective at identifying some fake followers and spammers.

The Pearson correlation matrix (the \(R^2\) values are the squares of its entries) also shows that the scores are weakly correlated.
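The matrix computation can be sketched as below; the per-type score columns here are hypothetical uniform placeholders for the real Botometer outputs:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Hypothetical scores: per-type Botometer scores alongside BotometerLite.
df = pd.DataFrame(
    rng.uniform(0.0, 1.0, (n, 5)),
    columns=["astroturf", "fake_follower", "spammer", "financial", "botometer_lite"],
)

corr = df.corr(method="pearson")  # Pearson correlation matrix
r_squared = corr ** 2             # element-wise squares give the R^2 values
print(r_squared["botometer_lite"].round(3))
```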


CAP Distribution

27% of the sample users have a Complete Automation Probability (CAP) of 0.75 or greater. Hence, if we apply a threshold of 0.75 to annotate bots in the data, roughly 1 out of 4 users in our sample would be labeled as a bot. This seems implausibly high.
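The thresholding step is a one-liner; a minimal sketch with synthetic CAP values in place of the real Botometer output:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical CAP values for a sample of users (real values come from Botometer).
cap = rng.uniform(0.0, 1.0, 10_000)

threshold = 0.75
bot_fraction = (cap >= threshold).mean()
print(f"{bot_fraction:.1%} of users would be labeled bots at CAP >= {threshold}")
```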

Conclusion

Future work for course project:

  • Update introduction to include other articles that have critiqued Botometer
  • Replicate results of Indiana University BotometerLite paper (Train a classifier to predict manually labeled bots and compare with BotometerLite)
  • Post code to the GitHub repo

Questions:

  • What statistical tests should we do to provide evidence Botometer and BotometerLite produce different results? t-test for difference of means? F-test for difference of variance? Hotelling test across all bot scores?
  • Should we look at just the scores derived from English-language accounts or also include the universal scores?
  • What visualizations should we use? t-SNE separating accounts with CAP > 0.8 from those with CAP < 0.8?
  • Is this study worth pursuing for publication on Medium or in a peer-reviewed journal?
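One of the candidate tests listed above, a paired t-test for a difference in mean scores, can be computed directly from its definition. The beta-distributed scores below are synthetic placeholders for the per-user Botometer and BotometerLite scores:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000

# Synthetic per-user scores; the real test would use the collected pairs.
botometer = rng.beta(2, 5, n)
lite = rng.beta(2, 2, n)

# Paired t-statistic on the per-user score differences.
d = botometer - lite
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))
print(f"t = {t_stat:.2f}")
```

A large |t| would indicate the two tools' mean scores differ; the Hotelling test generalizes this across all bot-type scores at once.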

References

Rauchfleisch, Adrian, and Jonas Kaiser. 2020. “The False Positive Problem of Automatic Bot Detection in Social Science Research.” Berkman Klein Center Research Publication, no. 2020-3.

Yang, Kai-Cheng, Onur Varol, Pik-Mai Hui, and Filippo Menczer. 2020. “Scalable and Generalizable Social Bot Detection Through Data Selection.” In Proceedings of the AAAI Conference on Artificial Intelligence, 34 (01): 1096–1103.